Q-learning: Flexible learning about useful utilities
Authors
Abstract
Dynamic treatment regimes are fast becoming an important part of medicine, with a corresponding shift in emphasis from treating the disease to treating the individual patient. Because few trials evaluate personally tailored treatment sequences, inferring optimal treatment regimes from observational data has become increasingly important. Q-learning is a popular method for estimating the optimal treatment regime, originally in randomized trials but more recently also in observational data. Previous applications of Q-learning have largely been restricted to continuous utility end-points with linear relationships. This paper is the first both to extend the framework to discrete utilities and to move the modelling of covariates from linear to more flexible forms using the generalized additive model (GAM) framework. Results on simulated data show that GAM-adapted Q-learning typically outperforms Q-learning with linear models and other frequently used methods based on propensity scores in terms of coverage and bias/MSE. This represents a promising step towards a more fully general Q-learning approach to estimating optimal dynamic treatment regimes. This article is in technical report form; the final publication is available at http://www.springerlink.com/openurl.asp?genre=article&id=doi:10.1007/s12561-0139103-z .
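For readers unfamiliar with Q-learning in the treatment-regime setting, the following is a minimal sketch of the linear-model baseline the abstract refers to, not the paper's own implementation. It assumes two stages, treatments coded -1/+1 and randomized, and a continuous outcome; the function names (`fit_ols`, `two_stage_q_learning`, `recommend`, `simulate`) and the simulated data-generating model are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_ols(design, response):
    """Ordinary least-squares coefficients for response ~ design."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

def two_stage_q_learning(x1, a1, x2, a2, y):
    """Backward-induction Q-learning with linear Q-functions.

    Returns the treatment-interaction coefficients (psi1, psi2) for
    stage 1 and stage 2; the estimated rule treats when psi[0] + psi[1]*x > 0.
    """
    ones = np.ones_like(x1)

    # Stage 2: regress the outcome on history and on A2 x (1, X2).
    main2 = np.column_stack([ones, x1, a1, a1 * x1, x2])
    blip2 = np.column_stack([ones, x2])
    coef2 = fit_ols(np.column_stack([main2, a2[:, None] * blip2]), y)
    beta2, psi2 = coef2[:5], coef2[5:]

    # Pseudo-outcome: predicted outcome under the best stage-2 treatment.
    # With a2 in {-1, +1}, max over a2 of a2 * c equals |c|.
    pseudo = main2 @ beta2 + np.abs(blip2 @ psi2)

    # Stage 1: regress the pseudo-outcome on X1 and on A1 x (1, X1).
    main1 = np.column_stack([ones, x1])
    coef1 = fit_ols(np.column_stack([main1, a1[:, None] * main1]), pseudo)
    psi1 = coef1[2:]
    return psi1, psi2

def recommend(psi, x):
    """Estimated rule: treat (+1) when the estimated treatment effect is positive."""
    return np.where(psi[0] + psi[1] * x > 0, 1.0, -1.0)

def simulate(n, seed=0):
    """Hypothetical randomized two-stage trial, used only to exercise the sketch."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    a1 = rng.choice([-1.0, 1.0], size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)
    a2 = rng.choice([-1.0, 1.0], size=n)
    y = (x1 + x2 + a1 * (0.5 - x1) + a2 * (1.0 + 2.0 * x2)
         + rng.normal(size=n))
    return x1, a1, x2, a2, y

x1, a1, x2, a2, y = simulate(5000, seed=0)
psi1, psi2 = two_stage_q_learning(x1, a1, x2, a2, y)
print("stage-1 interaction coefficients:", psi1)
print("stage-2 interaction coefficients:", psi2)
```

Note that the pseudo-outcome contains an absolute-value term, so it is generally nonlinear in the covariates even when the stage-2 model is linear. In this sketch the stage-1 main effect is therefore only approximated by a straight line, which is one place where a more flexible fit could be substituted for `fit_ols`.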
Similar resources
An Online Q-learning Based Multi-Agent LFC for a Multi-Area Multi-Source Power System Including Distributed Energy Resources
This paper presents an online two-stage Q-learning based multi-agent (MA) controller for load frequency control (LFC) in an interconnected multi-area multi-source power system integrated with distributed energy resources (DERs). The proposed control strategy consists of two stages. The first stage employs a PID controller whose parameters are designed using sine cosine optimization (SCO...
Evolving subjective utilities: Prisoner's Dilemma game examples
We have proposed the utility-based Q-learning concept that supposes an agent internally has an emotional mechanism that derives subjective utilities from objective rewards and the agent uses the utilities as rewards of Q-learning. We have also proposed such an emotional mechanism that facilitates cooperative actions in Prisoner’s Dilemma (PD) games. However, this mechanism has been designed and...
A Q-learning Based Continuous Tuning of Fuzzy Wall Tracking
A simple, easy-to-implement algorithm is proposed to address the wall-tracking task of an autonomous robot. The robot should navigate in unknown environments, find the nearest wall, and track it solely based on locally sensed data. The proposed method benefits from coupling fuzzy logic and Q-learning to meet the requirements of autonomous navigation. Fuzzy if-then rules provide a reliable decision maki...
Nursing students’ attitude about factors influencing clinical learning in Medical University of Guilan
Introduction: Education is a regular process that helps individuals acquire knowledge and new skills. Education is an active interaction between educator and learner. Learning is a stable process of change in an individual’s potential behavior. Therefore, we can only say that the students’ learning is satisfactory when learning causes proper behavioral changes ...
Dynamic Joint Action Perception for Q-Learning Agents
Q-learning is a reinforcement learning algorithm that learns expected utilities for state-action transitions through successive interactions with the environment. The algorithm's simplicity as well as its convergence properties have made it a popular algorithm for study. However, its non-parametric representation of utilities limits its effectiveness in environments with large amounts of percept...
Publication date: 2013